Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus
Abstract
In this paper we present an annotated corpus created to analyze the informative behaviour of emoji, an issue of importance for sentiment analysis and natural language processing. The corpus consists of 2475 tweets, all containing at least one emoji, annotated using one of three possible classes: Redundant, Non Redundant, and Non Redundant + POS. We explain how the corpus was collected and describe the annotation procedure and the interface developed for the task. We then analyze the corpus, consider possible predictive features, discuss the problematic aspects of the annotation, and suggest future improvements.
Similar resources
EmojiNet: An Open Service and API for Emoji Sense Discovery
This paper presents the release of EmojiNet, the largest machine-readable emoji sense inventory that links Unicode emoji representations to their English meanings extracted from the Web. EmojiNet is a dataset consisting of: (i) 12,904 sense labels over 2,389 emoji, which were extracted from the web and linked to machine-readable sense definitions seen in BabelNet; (ii) context words associated ...
Emoji as Emotion Tags for Tweets
In many natural language processing tasks, supervised machine learning approaches have proved most effective, and substantial effort has been made into collecting and annotating corpora for building such models. Emotion detection from text is no exception; however, research in this area is in its relative infancy, and few emotion annotated corpora exist to date. A further issue regarding the de...
Signals Revealing Street Gang Members on Twitter
We study the problem of automatically finding gang member profiles on Twitter. We outline a process to curate one of the largest sets of verifiable gang member profiles that has ever been studied. A review of these profiles establishes differences in the language, images, YouTube links, and emoji features gang members use compared to the rest of the Twitter population. We generate word embeddin...
Emoticons vs. Emojis on Twitter: A Causal Inference Approach
Online writing lacks the non-verbal cues present in face-to-face communication, which provide additional contextual information about the utterance, such as the speaker’s intention or affective state. To fill this void, a number of orthographic features, such as emoticons, expressive lengthening, and non-standard punctuation, have become popular in social media services including Twitter and Ins...
Emotion Analysis of Twitter Data That Use Emoticons and Emoji Ideograms
Twitter is an online social networking service on which users worldwide publish their opinions on a variety of topics, discuss current issues, complain, and express many kinds of emotions. Therefore, Twitter is a rich source of data for opinion mining, sentiment and emotion analysis. This paper focuses on this issue by analysing symbols called emotion tokens, including emotion symbols (e.g. emo...